OPTIMAL DISCOUNTED LINEAR CONTROL OF THE WIENER PROCESS


ABSTRACT


The following stochastic control problem is considered: to
minimize the discounted expected total cost

    $J(x;u) = E \int_0^\infty e^{-\alpha t}\,(|u_t(x)| + \phi(x_t))\,dt$

subject to $dx_t = u_t(x)\,dt + dw_t$, $x_0 = x$; $|u_t| \le 1$, $(w_t)$ a Wiener
process, $\alpha > 0$. All measurable, nonanticipative functionals $u_t(x)$ of the
state process $(x_t)$ that are bounded by unity are admissible as
controls. It is proved that the optimal law is of the form

    $u_t^*(x) = \begin{cases} -1, & x_t > b \\ 0, & |x_t| \le b \\ 1, & x_t < -b \end{cases}$

for some switching point $b > 0$, characterized in terms of the
function $\phi(\cdot)$ through a transcendental equation.
1.  INTRODUCTION AND SUMMARY


We consider the problem of discounted optimal control of the
Wiener process $(w_t;\ t \ge 0)$, transformed by the action of a
nonanticipative control into the "state process" $(x_t;\ t \ge 0)$. The
latter satisfies the "state equation" $dx_t = u_t(x)\,dt + dw_t$, $t \ge 0$;
$x_0 = x$ on an appropriate probability space.

There is a cost $\phi(x_t)$ per unit time for being in the wrong
state, where $\phi(\cdot) \ge 0$ is an even, uniformly convex function on the
reals whose second derivative is decreasing with distance from the
origin. There is also a cost $|u_t|$ per unit time for using the
control. Both costs are discounted in time by the factor $e^{-\alpha t}$.
The controller has to choose a law $u_t(x)$ as a nonanticipative
measurable functional of the state process with values in $[-1,1]$
so as to minimize the expected discounted total cost.

The "physically obvious" law is to push with full force (in
the right direction) if $x$ is outside a certain neighbourhood
of the origin, and to exert no control at all if $x$ is in this
neighbourhood.

Optimality of this law is proved and the cutoff point $b$
separating the active region from the dead zone is characterized
in terms of the function $\phi(\cdot)$ through the transcendental
equation (3.11). Existence and uniqueness of a solution to the
above equation are proved by making use of the aforementioned properties
of $\phi(\cdot)$. It is an interesting problem to relax these assumptions
in order to allow cost functions with general polynomial or even
exponential growth.

General  existence  results  for  the  problem  of  discounted 
stochastic control were given by Kushner [1967]. Benes, Shepp
and  Witsenhausen  (1979)  proved  optimality  of  the  bang-bang  law  in 
the  case  of  a  quadratic  running  cost  on  the  state  and  no  cost  on 
the  control.  They  also  treated  the  finite-fuel  problem  with  a 
discounted  cost  criterion. 

In the present paper we proceed by formulating the control
problem in Section 2. The Bellman equation of dynamic programming
is explicitly solved in Section 3 and the candidate for the optimal
law is discerned from the properties of the solution.
Optimality of the candidate is proved in Section 4.


KEY WORDS AND PHRASES: Discounted stochastic control, Bellman
equation, dead-zone controllers


2.  THE  STOCHASTIC  CONTROL  PROBLEM 

Consider as basic probability space $\Omega$ the space $C(\mathbb{R}^+)$ of
continuous, real-valued functions on $\mathbb{R}^+$ and let $\mathcal{F}_t$, $t \ge 0$
denote the $\sigma$-field generated by $\{x_s;\ s \le t\}$, $x \in \Omega$. Consider
also the $\sigma$-field $\mathcal{I}$ generated by the subsets $M$ of $\mathbb{R}^+ \times C(\mathbb{R}^+)$
with the property that each $t$-section of $M$ belongs to $\mathcal{F}_t$
and each $x$-section of $M$ is Lebesgue measurable. A function
$g$ defined on $\mathbb{R}^+ \times C(\mathbb{R}^+)$ is $\mathcal{I}$-measurable if and only if $g(t,\cdot)$
is $\mathcal{F}_t$-measurable for any $t \ge 0$ and $g(\cdot,x)$ is Lebesgue
measurable, for any $x \in C(\mathbb{R}^+)$.

Definition 2.1: Let the control measure space be the compact interval
$[-1,1]$ with its Borel sets. An admissible nonanticipative
control $u$ is a measurable function $u: \mathbb{R}^+ \times C(\mathbb{R}^+) \to [-1,1]$.

The class of all such controls is denoted by $\mathcal{U}$. For any control
law $u \in \mathcal{U}$ and any $x \in \mathbb{R}$ we can construct by means of the
Girsanov theorem a probability space $(\Omega, \mathcal{F}, P)$ and a pair of
stochastic processes $(x_t, w_t)$ on it, such that $\{w_t;\ t \ge 0\}$ is
a Wiener process with respect to $P$ and the stochastic differential
equation

(2.1)  $dx_t = u_t(x)\,dt + dw_t, \quad t \ge 0$

(2.2)  $x_0 = x$

is satisfied. Such a "weak solution" of (2.1) is known to be unique
in the sense of the probability law; see, for instance, Liptser
and Shiryayev (1977).

Consider now a nonnegative function $\phi$ on the reals which
is even, $C^4(\mathbb{R})$, and uniformly convex in the sense that

(2.3)  $0 < k \le \phi''(x) \le K$, all $x \in \mathbb{R}$

for some positive constants $k, K$, with $\phi''(x)$ decreasing on $x > 0$.

The control problem consists in finding a law $u^* \in \mathcal{U}$ that
minimizes the "discounted" expected total cost

(2.4)  $J(x;u) = E \int_0^\infty e^{-\alpha t}\,(|u_t(x)| + \phi(x_t))\,dt$

of starting at place $x$ and using control $u$, over all $u \in \mathcal{U}$,
$x \in \mathbb{R}$. Here $E$ denotes expectation with respect to the
probability measure $P$, $\alpha > 0$ is the "discount factor", $\phi(\cdot)$ is
the running cost on the state and $|\cdot|$ is the cost of control.
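
The cost functional (2.4) can be estimated by simulation. The following is a minimal sketch, assuming an Euler-Maruyama discretization of (2.1), a truncated time horizon, and a control that depends only on the current state (a special case of the nonanticipative functionals admitted above). The function name `discounted_cost` and all parameter values are illustrative choices, not part of the paper.

```python
import math
import random

def discounted_cost(x0, law, phi, alpha, horizon=10.0, dt=0.01,
                    n_paths=2000, seed=0):
    """Monte Carlo estimate of E int_0^horizon e^{-alpha t} (|u_t| + phi(x_t)) dt.

    Assumptions of this sketch: the infinite horizon is truncated at `horizon`,
    and dx = u dt + dw is discretized with the Euler-Maruyama scheme.
    """
    rng = random.Random(seed)
    n_steps = int(round(horizon / dt))
    sqrt_dt = math.sqrt(dt)
    # Discount weights of the left-endpoint Riemann sum, computed once.
    weights = [math.exp(-alpha * i * dt) * dt for i in range(n_steps)]
    total = 0.0
    for _ in range(n_paths):
        x, cost = x0, 0.0
        for w in weights:
            u = max(-1.0, min(1.0, law(x)))   # enforce the constraint |u| <= 1
            cost += w * (abs(u) + phi(x))
            x += u * dt + sqrt_dt * rng.gauss(0.0, 1.0)
        total += cost
    return total / n_paths

# Do-nothing law with quadratic state cost, started at the origin.
estimate = discounted_cost(0.0, lambda x: 0.0, lambda x: x * x, alpha=1.0)
```

For $u \equiv 0$, $\phi(x) = x^2$, $\alpha = 1$ and $x = 0$ the exact cost is $E \int_0^\infty e^{-t} w_t^2\,dt = \int_0^\infty t\,e^{-t}\,dt = 1$, so `estimate` should be close to 1 up to Monte Carlo, truncation and discretization error.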
3.  THE EQUATION OF DYNAMIC PROGRAMMING

The method proceeds by constructing a solution to the Bellman
equation of dynamic programming which satisfies certain growth
and symmetry conditions. Introduce the function

(3.1)  $a(p) = \min_{|u| \le 1} (|u| + pu) = \begin{cases} 0, & |p| \le 1 \\ 1 - |p|, & |p| > 1. \end{cases}$
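
The closed form in (3.1) is easy to check numerically. The sketch below compares it with a brute-force minimization of $|u| + pu$ over a grid of controls in $[-1,1]$; the function names and the sample values of $p$ are illustrative.

```python
def a_closed_form(p):
    """Closed form of (3.1): 0 if |p| <= 1, else 1 - |p|."""
    return 0.0 if abs(p) <= 1.0 else 1.0 - abs(p)

def a_brute_force(p, n=2001):
    """Minimize |u| + p*u over a uniform grid of u in [-1, 1].

    The grid contains -1, 0 and 1, where the piecewise-linear
    objective attains its minimum.
    """
    grid = [-1.0 + 2.0 * i / (n - 1) for i in range(n)]
    return min(abs(u) + p * u for u in grid)

samples = [-3.0, -1.0, -0.4, 0.0, 0.7, 1.0, 2.5]
max_gap = max(abs(a_closed_form(p) - a_brute_force(p)) for p in samples)
```

Since the objective is piecewise linear in $u$ with a kink at $u = 0$, the minimum is always attained at $u \in \{-1, 0, 1\}$, which is why the optimal control turns out to take only these three values.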

The formal Bellman equation for this problem is

(3.2)  $\alpha v = \tfrac{1}{2} v_{xx} + a(v_x) + \phi(x), \quad x \in \mathbb{R}$.

We are looking for a positive number $b$ and an even solution of
(3.2), $v(x) = O(x^2)$ as $|x| \to \infty$, such that $v_x(b) = 1$ and

(3.3)  $\alpha v = \tfrac{1}{2} v_{xx} + \phi(x)$,  $0 \le v_x(x) \le 1$ on $0 \le x \le b$

(3.4)  $\alpha v = \tfrac{1}{2} v_{xx} + 1 - v_x + \phi(x)$,  $v_x(x) \ge 1$ on $x \ge b$.

A particular solution to the equation in (3.3) is given by
the cost of "doing nothing" all the time. Indeed, consider the
"naive" control law $u_t(x) \equiv 0$. The corresponding cost is

    $p(x) = E \int_0^\infty e^{-\alpha t}\,\phi(x + w_t)\,dt$

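and it becomes an easy exercise in Laplace transforms to verify
that

(3.5)  $p(x) = \frac{1}{\sqrt{2\alpha}} \int_{-\infty}^{\infty} \phi(z)\,e^{-\sqrt{2\alpha}\,|x-z|}\,dz$.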

This function is even, has the growth of $\phi(\cdot)$ as $|x| \to \infty$ and
satisfies the equation in (3.3), as is easily verified. To get
the general solution of the latter, we add to $p(x)$ a solution

    $A_1 e^{x\sqrt{2\alpha}} + A_2 e^{-x\sqrt{2\alpha}}$

of the homogeneous equation $\alpha v = \tfrac{1}{2} v_{xx}$. Since $v(\cdot)$ has to be even, and
consequently $v_x(0) = 0$, $A_1 = A_2 = \tfrac{A}{2}$. So

(3.6)  $v(x) = A \cosh(x\sqrt{2\alpha}) + p(x)$, on $0 \le x \le b$.

Condition $v_x(b) = 1$ then implies

(3.7)  $A = \frac{1 - p'(b)}{\sqrt{2\alpha}\,\sinh(b\sqrt{2\alpha})}$.
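
The do-nothing cost $p(x)$ can be checked numerically. The sketch below assumes the standard resolvent kernel of Brownian motion, $E \int_0^\infty e^{-\alpha t} f(x + w_t)\,dt = \frac{1}{\sqrt{2\alpha}} \int f(z)\,e^{-\sqrt{2\alpha}|x-z|}\,dz$, and compares a trapezoid-rule evaluation with the value $x^2/\alpha + 1/\alpha^2$ obtained by direct integration for $\phi(x) = x^2$. The function name, the truncation width and the step size are illustrative choices.

```python
import math

def p_resolvent(x, phi, alpha, half_width=30.0, h=0.01):
    """Trapezoid rule for (1/c) * int phi(z) exp(-c |x - z|) dz, c = sqrt(2 alpha).

    The integral is truncated to |z - x| <= half_width (an assumption of
    this sketch); the grid is centred at x so the kink lies on a node.
    """
    c = math.sqrt(2.0 * alpha)
    n = int(round(half_width / h))
    total = 0.0
    for i in range(-n, n + 1):
        y = i * h                                # y = z - x
        weight = 0.5 if abs(i) == n else 1.0     # trapezoid end weights
        total += weight * phi(x + y) * math.exp(-c * abs(y))
    return total * h / c

alpha, x = 1.0, 0.7
numeric = p_resolvent(x, lambda z: z * z, alpha)
exact = x * x / alpha + 1.0 / alpha ** 2         # E int e^{-alpha t} (x^2 + t) dt
```

The two values agree up to quadrature error, which also confirms that $p$ inherits the quadratic growth of $\phi$ in this case.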


Similarly, a particular solution to (3.4) is obtained by
considering the cost corresponding to the naive law $u_t(x) \equiv -1$
of pushing with full force to the left all the time:

    $q(x) = E \int_0^\infty e^{-\alpha t}\,(1 + \phi(x - t + w_t))\,dt$

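and it is easily verified that, if $\theta = \sqrt{1 + 2\alpha} - 1$,

(3.8)  $q(x) = \frac{1}{\alpha} + \frac{1}{\sqrt{1+2\alpha}} \left[ \int_{-\infty}^{x} \phi(z)\,e^{-\theta(x-z)}\,dz + \int_{x}^{\infty} \phi(z)\,e^{-(2+\theta)(z-x)}\,dz \right]$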


solves the equation in (3.4) and has the growth of $\phi(\cdot)$ as $|x| \to \infty$.
To get the general solution of (3.4) one has to add to $q(x)$ the
general solution

    $B e^{-\theta x} + C e^{(2+\theta)x}$

of the corresponding homogeneous equation $\alpha v = \tfrac{1}{2} v_{xx} - v_x$ (note
that $2 + \theta$ and $-\theta$ are the roots of the characteristic polynomial
$s^2 - 2s - 2\alpha$). The growth condition implies $C = 0$, so

(3.9)  $v(x) = B e^{-\theta x} + q(x)$, on $x \ge b$,

where

(3.10)  $B = \frac{q'(b) - 1}{\theta}\,e^{\theta b}$

because of $v_x(b) = 1$. Matching the values of $v(\cdot)$ from the two
sides at $x = b$ gives the equation for the switching point $b$:

(3.11)  $\tanh(b\sqrt{2\alpha}) = \frac{1 - p'(b)}{\sqrt{2\alpha}\,\left[\, q(b) - p(b) + \frac{q'(b) - 1}{\theta} \,\right]}$,

$p(\cdot)$ and $q(\cdot)$ being the functions in (3.5), (3.8).
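
The remark about the characteristic polynomial can be verified in a few lines. The sketch below checks that $2 + \theta$ and $-\theta$ are roots of $s^2 - 2s - 2\alpha$ and that $e^{-\theta x}$ solves the homogeneous equation $\alpha v = \tfrac{1}{2} v_{xx} - v_x$, using central differences; the value of $\alpha$, the evaluation point and the helper names are arbitrary illustrative choices.

```python
import math

def theta(alpha):
    """theta = sqrt(1 + 2 alpha) - 1."""
    return math.sqrt(1.0 + 2.0 * alpha) - 1.0

def char_poly(s, alpha):
    """Characteristic polynomial s^2 - 2s - 2 alpha of alpha v = v_xx / 2 - v_x."""
    return s * s - 2.0 * s - 2.0 * alpha

def homogeneous_residual(x, alpha, h=1e-4):
    """Residual alpha v - v_xx / 2 + v_x for v(x) = exp(-theta x), by central differences."""
    th = theta(alpha)
    v = lambda y: math.exp(-th * y)
    v_x = (v(x + h) - v(x - h)) / (2.0 * h)
    v_xx = (v(x + h) - 2.0 * v(x) + v(x - h)) / (h * h)
    return alpha * v(x) - 0.5 * v_xx + v_x

alpha = 0.8
root_error = max(abs(char_poly(2.0 + theta(alpha), alpha)),
                 abs(char_poly(-theta(alpha), alpha)))
residual = abs(homogeneous_residual(1.3, alpha))
```

Only the decaying exponential $e^{-\theta x}$ is compatible with polynomial growth of $v$ as $x \to \infty$, which is why the coefficient of $e^{(2+\theta)x}$ must vanish.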


Thus, there exists a unique number $b_1 > 0$, such that $m_1(b_1) = 0$.

On the other hand, since $\theta < \sqrt{2\alpha} < 2 + \theta$, $m_1(x) > m_2(x)$ and
$m_2'(x) \ge k > 0$ on $x > 0$, by the assumption of decreasing curvature;
here $m_2'(x)$ is expressed through integrals of the form
$\int \phi''(z)\,e^{-(2+\theta)(z-x)}\,dz$ and $\int_0^\infty (\phi''(z-x) - \phi''(z+x))\,e^{-z\sqrt{2\alpha}}\,dz$.
So there exists a unique number $b_2 > b_1$, such that $m_2(b_2) = 0$. Now the
function $m(x)$ is negative on $(0,b_1)$ and $(b_2,\infty)$, is equal to
zero at $b_1$, and increases monotonically to infinity on $(b_1,b_2)$
as $x \to b_2$.

Consequently, there exists a unique $b \in (b_1,b_2)$ such that
$\tanh(b\sqrt{2\alpha}) = m(b)$, q.e.d.

Once $b$ has been thus determined, one constructs the function

(3.12)  $v(x) = \begin{cases} \frac{1 - p'(b)}{\sqrt{2\alpha}\,\sinh(b\sqrt{2\alpha})}\,\cosh(x\sqrt{2\alpha}) + p(x), & 0 \le x \le b \\ \frac{q'(b) - 1}{\theta}\,e^{-\theta(x-b)} + q(x), & x > b \\ v(-x), & x < 0 \end{cases}$

in accordance with (3.6), (3.7), (3.9), (3.10), where $p(x)$ and
$q(x)$ are again the functions in (3.5) and (3.8). The function
$v(\cdot)$ in (3.12) satisfies equations (3.3) and (3.4) on $(0,b)$
and $(b,\infty)$ respectively, as well as $v_x(b) = 1$, by construction,
and $v(b+) = v(b-)$, by (3.11).

It remains to prove that $v(\cdot)$ solves the Bellman equation
(3.2). It suffices to prove: $0 \le v_x(x) \le 1$ on $(0,b)$ and
$v_x(x) \ge 1$ on $[b,\infty)$, while in turn this is an easy corollary
of convexity:

Proposition 3.2. The function in (3.12) is convex: $v_{xx}(x) \ge 0$,
$x \in \mathbb{R}$.

Proof. A bit of algebra shows that, on $0 \le x \le b$,

    $\tfrac{1}{2} v_{xx}(x) = \alpha v(x) - \phi(x) > (\alpha p(x) - \phi(x)) - (\alpha p(b) - \phi(b))$.

Note that $\alpha p'(x) - \phi'(x) = \tfrac{1}{2} \int_0^\infty (\phi''(z+x) - \phi''(z-x))\,e^{-z\sqrt{2\alpha}}\,dz \le 0$, by
the decreasing second derivative assumption. Therefore, $\alpha p(x) - \phi(x)
\ge \alpha p(b) - \phi(b)$ and

(3.13)  $v_{xx}(x) > 0$,  $0 \le x \le b$.

By continuity of $v_{xx}(\cdot)$, $v_{xx}(x) > 0$ on $(b, b+\varepsilon)$, $\varepsilon > 0$ sufficiently
small. On the other hand, if $w = v_{xx}$:

    $w_{xx} - 2w_x - 2\alpha w = -2\phi''(x) < 0$, on $(b,\infty)$.

By the maximum principle (Friedman (1964), p. 53, Theorem 18),
$v_{xx}(\cdot)$ cannot have a negative minimum on $(b,\infty)$. However, on
this interval,

    $v_{xx}(x) = \theta(q'(b) - 1)\,e^{-\theta(x-b)} + q''(x) \ge \theta(q'(b) - 1)\,e^{-\theta(x-b)} + \frac{k}{\alpha}$,

since

    $q''(x) = \frac{1}{\sqrt{1+2\alpha}} \left[ e^{(2+\theta)x} \int_x^\infty \phi''(z)\,e^{-(2+\theta)z}\,dz + e^{-\theta x} \int_{-\infty}^x \phi''(z)\,e^{\theta z}\,dz \right]$.

Therefore, $v_{xx}(x) > 0$ for $x$ sufficiently large, so if $v_{xx}(x) < 0$ for
some $x \in (b,\infty)$, $v_{xx}(\cdot)$ would have a negative minimum there,
contradicting the maximum principle. Therefore $v_{xx}(x) \ge 0$
also on $[b+\varepsilon,\infty)$, q.e.d.


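Special Case: In the special case $\phi(x) = x^2$, we have

    $p(x) = \frac{x^2}{\alpha} + \frac{1}{\alpha^2}$,   $q(x) = \frac{x^2}{\alpha} - \frac{2x}{\alpha^2} + \frac{2}{\alpha^3} + \frac{1}{\alpha^2} + \frac{1}{\alpha}$,

    $m_1(x) = x - \frac{\alpha}{2}$,   $m_2(x) = x - \left(\frac{\alpha}{2} + \frac{1}{\alpha}\right)$,

so that $b_1 = \alpha/2$ and $b_2 = \alpha/2 + 1/\alpha$. It can
be shown that $v_{xx}(x) > 0$, any $x \in \mathbb{R}$, in this case.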
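
In the quadratic case the switching point can be computed numerically. The following is a minimal sketch, not the paper's procedure: it takes the closed forms $p(x) = x^2/\alpha + 1/\alpha^2$ and $q(x) = p(x) - 2x/\alpha^2 + 2/\alpha^3 + 1/\alpha$ obtained by direct integration, and locates by bisection on $(b_1, b_2) = (\alpha/2,\ \alpha/2 + 1/\alpha)$ the point where $v$ is continuous, i.e. where $A\cosh(b\sqrt{2\alpha}) + p(b) = B e^{-\theta b} + q(b)$ with $A$, $B$ fixed by $v_x(b) = 1$. The function name and the choice $\alpha = 1$ are illustrative.

```python
import math

def quadratic_case(alpha):
    """Return (b1, b2, b, jump) for phi(x) = x^2, with jump = v(b-) - v(b+)."""
    c = math.sqrt(2.0 * alpha)
    th = math.sqrt(1.0 + 2.0 * alpha) - 1.0
    b1 = alpha / 2.0                     # zero of 1 - p'(x)
    b2 = alpha / 2.0 + 1.0 / alpha       # zero of q - p + (q' - 1) / theta

    p = lambda x: x * x / alpha + 1.0 / alpha ** 2
    dp = lambda x: 2.0 * x / alpha
    q = lambda x: p(x) - 2.0 * x / alpha ** 2 + 2.0 / alpha ** 3 + 1.0 / alpha
    dq = lambda x: 2.0 * x / alpha - 2.0 / alpha ** 2

    def g(b):
        # Equals tanh(b c) * (v(b-) - v(b+)); written without division
        # so that it stays finite up to b2.
        right = q(b) - p(b) + (dq(b) - 1.0) / th
        return (1.0 - dp(b)) / c - math.tanh(b * c) * right

    lo, hi = b1, b2                      # g(lo) > 0 > g(hi)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    b = 0.5 * (lo + hi)

    A = (1.0 - dp(b)) / (c * math.sinh(b * c))    # from v_x(b) = 1 on the left
    B = (dq(b) - 1.0) * math.exp(th * b) / th     # from v_x(b) = 1 on the right
    jump = (A * math.cosh(b * c) + p(b)) - (B * math.exp(-th * b) + q(b))
    return b1, b2, b, jump

b1, b2, b, jump = quadratic_case(alpha=1.0)
```

Bisection is used here only because the sign change of `g` between $b_1$ and $b_2$ is easy to establish; any bracketing root finder would serve equally well.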
4.  THE OPTIMAL LAW

Let us prove that $v(\cdot)$ is the value function of the control
problem, i.e. an (attainable) lower bound on the expected discounted
total cost, and try to discern the law that achieves this infimum.
Consider any admissible law $u \in \mathcal{U}$ along with the corresponding
state process $(x_t^u)$, solving equation (2.1)-(2.2) in the weak sense.
We introduce the process

(4.1)  $V_t^u = v(x_t^u)\,e^{-\alpha t}$

and note that

    $E\,v(x_t^u) \le E\,v(|x| + t + |w_t|) = O(t^2)$ as $t \to \infty$, so $\lim_{t \to \infty} E\,V_t^u = 0$,

any $\alpha > 0$, $x \in \mathbb{R}$, $u \in \mathcal{U}$.

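Applying Ito's rule to (4.1) we get

    $V_T^u = v(x) + \int_0^T e^{-\alpha t}\,\big(-\alpha v(x_t^u) + \tfrac{1}{2} v_{xx}(x_t^u) + u_t v_x(x_t^u) + |u_t| + \phi(x_t^u)\big)\,dt$
    $\qquad - \int_0^T e^{-\alpha t}\,\big(|u_t(x^u)| + \phi(x_t^u)\big)\,dt + \int_0^T e^{-\alpha t}\,v_x(x_t^u)\,dw_t$.

Since $v(\cdot)$ satisfies equation (3.2), the first integrand is
nonnegative. Taking expectations and then passing to the limit
as $T \to \infty$ we get

(4.2)  $J(x;u) = E \int_0^\infty e^{-\alpha t}\,\big(|u_t(x^u)| + \phi(x_t^u)\big)\,dt \ge v(x)$;  $u \in \mathcal{U}$, $x \in \mathbb{R}$.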
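
The passage from Ito's rule to the lower bound (4.2) can be spelled out as follows. This is a sketch under the assumption that the stochastic integral in the Ito expansion has zero expectation; the symbol $N_t$ for the first (nonnegative) integrand is introduced here for brevity and is not the paper's notation.

```latex
% N_t := -\alpha v(x_t^u) + \tfrac12 v_{xx}(x_t^u) + u_t v_x(x_t^u) + |u_t| + \phi(x_t^u) \ge 0
\begin{aligned}
E\,V_T^u &= v(x) + E\int_0^T e^{-\alpha t} N_t\,dt
            - E\int_0^T e^{-\alpha t}\big(|u_t(x^u)| + \phi(x_t^u)\big)\,dt, \\
E\int_0^T e^{-\alpha t}\big(|u_t(x^u)| + \phi(x_t^u)\big)\,dt
         &= v(x) - E\,V_T^u + E\int_0^T e^{-\alpha t} N_t\,dt
          \;\ge\; v(x) - E\,V_T^u, \\
J(x;u)   &\ge v(x) - \lim_{T\to\infty} E\,V_T^u = v(x).
\end{aligned}
```

The last line uses monotone convergence on the left-hand side and the limit $\lim_{t\to\infty} E\,V_t^u = 0$ noted after (4.1). Equality holds exactly when $N_t$ vanishes along the path, which is the condition that singles out the optimal law.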


Consider now the law $u^* \in \mathcal{U}$

(4.3)  $u_t^*(x) = \begin{cases} -1, & x_t > b \\ 0, & |x_t| \le b \\ 1, & x_t < -b \end{cases}$

obtained through the minimization: $|u_t^*(x)| + v_x(x_t)\,u_t^*(x) = a(v_x(x_t))$.
The corresponding state process $(x_t^*)$ satisfies


(4.4)  $dx_t^* = u_t^*(x^*)\,dt + dw_t$;  $t \ge 0$

on an appropriate probability space. Although no explicit use is
made of this fact in the present context, we mention that (4.4)
is strongly solvable for $x^*$ as a causal functional of $w$
because $u^*(x)$ in (4.3) is instantaneous, bounded and measurable;
see Zvonkin [1974]. For $u = u^*$ the inequalities above hold as equalities,
and

(4.5)  $J(x;u^*) = v(x)$;  $x \in \mathbb{R}$.

From (4.2), (4.5), $v(x)$ is a lower bound on the performance (2.4),
and is achieved by the process $(x_t^*)$. In other words, $u^*(x)$ is
optimal.
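
The dead-zone law can be written as a three-valued feedback function and compared with a direct minimization of $|u| + p\,u$. The sketch below is illustrative only: the switching point `b = 1.0` and the function `slope`, which stands in for $v_x$ (odd, increasing, equal to 1 at $b$), are hypothetical placeholders rather than quantities computed from the paper's equations.

```python
def dead_zone_law(x, b):
    """Feedback form of the dead-zone law: push left above b, right below -b, idle in between."""
    if x > b:
        return -1.0
    if x < -b:
        return 1.0
    return 0.0

def argmin_control(p, n=2001):
    """Grid minimizer of |u| + p*u over u in [-1, 1]; ties go to the smallest |u|."""
    grid = [-1.0 + 2.0 * i / (n - 1) for i in range(n)]
    return min(grid, key=lambda u: (abs(u) + p * u, abs(u)))

b = 1.0                               # hypothetical switching point
slope = lambda x: x / b               # hypothetical stand-in for v_x, equal to 1 at x = b
states = [-2.5, -1.2, -0.3, 0.0, 0.6, 1.7]
agree = all(dead_zone_law(x, b) == argmin_control(slope(x)) for x in states)
```

Any odd, nondecreasing gradient that crosses the level 1 at $x = b$ would produce the same agreement, which reflects the fact that the law depends on $v$ only through the sign of $v_x$ and the set where $|v_x| > 1$.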